YITP Uji Research Center



YITP/U-94-27 
December 1994 



COUNTS-IN-CELLS ANALYSIS 
OF THE STATISTICAL DISTRIBUTION 
IN AN N-BODY SIMULATED UNIVERSE 



Haruhiko Ueda 

Department of Astronomy, Kyoto University, 
Kyoto 606-01, Japan, and 

Uji Research Center 
Yukawa Institute for Theoretical Physics 
Kyoto University, Uji 611, Japan 

E-mail ueda@yisunl.yukawa.kyoto-u.ac.jp
and 

Jun'ichi Yokoyama 

Uji Research Center 
Yukawa Institute for Theoretical Physics 
Kyoto University, Uji 611, Japan 

E-mail yokoyama@yisunl.yukawa.kyoto-u.ac.jp

Abstract 

The evolution of the statistical distribution of the density field is investigated
by means of a counts-in-cells method in a low-density cold-dark-matter simulated
universe. Four theoretical distributions, i.e. the negative binomial distribution,
the lognormal distribution, the Edgeworth series and the skewed lognormal
distribution, are fitted to the calculated distribution function, and it is shown
that only the skewed lognormal distribution of second and third order can describe
the evolution of the statistical distribution perfectly well from the initially
Gaussian regime to the present stage. The effects of sparse sampling are also
investigated, and it is argued that one should use a sample with a number density
of galaxies larger than $\sim 0.01\,h^3\,\mathrm{Mpc}^{-3}$ in order to recover the
underlying density distribution.
1 Introduction



It is widely believed today that the observed rich hierarchy of the large scale
structure in the universe has emerged through gravitational instability of small
initial inhomogeneities, which are usually assumed to obey Gaussian statistics.
While the statistical properties of a Gaussian distribution are fully specified by
the two-point correlation function, $\xi_2(r,t)$, or the power spectrum, even the
perturbative evolution of fluctuations in the presence of gravity deforms the
statistics as soon as second- and higher-order effects are taken into account
(see e.g. Peebles 1980), not to mention the highly non-Gaussian nature of the
galaxy distribution today.

One way to quantify such deviations is to estimate higher-order reduced
correlation functions, which are difficult to measure because they emerge only
after carefully subtracting the contributions of their lower-order counterparts.
Another quantity often used is the void probability, $P_0(V)$, namely the
probability of finding no galaxies in a volume $V$. This measure, however, only
reflects the Poissonian nature of a discrete distribution for small $V$, and it
suffers from a large error for larger $V$ because the void probability becomes
smaller and smaller as we increase $V$ (Gaztanaga and Yokoyama 1993). Thus we
should not restrict ourselves to the void probability but consider the probability
of finding $N$ galaxies, $P(N,V)$, as well. This quantity can easily be measured by
the counts-in-cells method, and it contains complete statistical information in
the sense that it depends on correlation functions of arbitrarily high order
averaged over the volume $V$ (White 1979).

Therefore it is an important issue of theoretical cosmology to clarify the nature
of the statistical distributions of galaxies which sensible models of structure
formation predict, so that one can compare various scenarios with observations
and single out the correct model. Equally important for understanding the
evolution of the universe is to trace the time evolution of the statistical
distribution, from the initially Gaussian state to the highly non-Gaussian
distribution today.

In the present paper, using the results of N-body simulations, we analyze various
proposals for the theoretical modeling of the statistical distribution and
phenomenologically investigate which model fits the simulation best. A number of
models of the probability distribution function (PDF) have been proposed. The
oldest one is probably the lognormal distribution, which was first applied to the
galaxy distribution by Hubble (1934). The negative binomial distribution, which
has the hierarchical property in higher-order reduced moments, has also been
adopted by a number of authors (Fry 1986, Carruthers 1991, Gaztanaga & Yokoyama
1993, Bouchet et al. 1993). More recently the Edgeworth expansion around the
Gaussian distribution has been proposed to modify it to incorporate higher-order
correlations (Juszkiewicz et al. 1993). A similar expansion has also been applied
to the lognormal distribution and called the skewed lognormal approximation
(Colombi 1994). We compare these models with the results of counts analysis of
simulated data at various epochs. There are, of course, other models of the PDF,
such as Saslaw's thermodynamic model (Saslaw and Hamilton 1984) or
Balian-Schaeffer's model (1989). We do not consider them here, since they have
already been analyzed rather extensively (Itoh, Inagaki, & Saslaw 1988, Suto,
Itoh, & Inagaki 1990, Colombi et al. 1994).

The rest of the paper is organized as follows. In §2 we summarize the properties
of the theoretical distributions we use. The simulated data and the procedure of
the analysis are described in §3. The results of the counts analysis and the
comparison with theoretical models are given in §4. In §5 the dependence of the
PDF on the number density of the sample is reported. Finally §6 is devoted to
discussion and conclusion.

2 Theoretical Models of statistical distribution 

Here we summarize properties of four theoretical distributions we consider in 
turn. 

2.1 Negative Binomial Distribution 

The negative binomial (or modified Bose-Einstein) distribution has been used in
a number of fields with different physical backgrounds such as quantum optics
(Klauder and Sudarshan 1968), hadronic multiplicity (Carruthers and Shih 1983),
and galaxy counts in a Zwicky cluster (Carruthers and Minh 1983), in addition to
the analysis of large scale galaxy counts (Fry 1986, Carruthers 1991, Gaztanaga &
Yokoyama 1993, Bouchet et al. 1993). The distribution has been theoretically
re-derived by Elizalde and Gaztanaga (1992) in a manner appropriate to our
problem. That is, this distribution is obtained by incorporating the two-body
correlation into the Poisson distribution in the simplest manner. The probability
of finding $N$ particles in a cell of volume $V$ is given by

$$P(N,V) = \frac{\bar N^N}{N!}\,(1+\bar N\bar\xi_2)^{-N-1/\bar\xi_2}\prod_{i=1}^{N-1}(1+i\,\bar\xi_2), \tag{1}$$

where $\bar N = \bar N(V)$ is the average number of particles contained in a cell and

$$\bar\xi_J(V) = \frac{1}{V^J}\int_V \xi_J(\mathbf{r}_1,\ldots,\mathbf{r}_J)\,d^3r_1\cdots d^3r_J \tag{2}$$

is the amplitude of the $J$-point correlation function averaged over the volume $V$.

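One can readily see that in the limit $\bar\xi_2 \to 0$ the negative binomial model
reduces to the Poisson distribution,

$$P(N,V) = \frac{\bar N^N}{N!}\,e^{-\bar N}. \tag{3}$$

In the continuum limit $\bar N \to \infty$ with $x \equiv N/\bar N$ constant, the PDF yields

$$P(x) = \lim_{\bar N\to\infty}\bar N\,P(N,V) = \frac{1}{\bar\xi_2\,\Gamma(1/\bar\xi_2)}\left(\frac{x}{\bar\xi_2}\right)^{\frac{1}{\bar\xi_2}-1}\exp\left(-\frac{x}{\bar\xi_2}\right). \tag{4}$$

If we further take the limit $\bar\xi_2 \to 0$, it approaches Gaussian:

$$P(x) \to \frac{1}{\sqrt{2\pi\bar\xi_2}}\exp\left[-\frac{(x-1)^2}{2\bar\xi_2}\right]. \tag{5}$$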



One of the most important features of the negative binomial model is that it obeys
a hierarchical form of correlation functions. That is, the $J$-point reduced
correlation function averaged over the volume $V$ can be expressed as

$$\bar\xi_J(V) = S_J\left[\bar\xi_2(V)\right]^{J-1}. \tag{6}$$



This relation is observationally supported at least for $J = 3$ and 4 (Peebles 1980,
Gaztanaga 1992), although such higher-order correlation functions suffer from large
errors. For example, Gaztanaga (1992) finds

$$S_3 = 1.86 \pm 0.07\ (2.01 \pm 0.13), \qquad S_4 = 4.15 \pm 0.60\ (4.96 \pm 0.88),$$

analyzing north CfAI (SSRS) data, respectively. In the negative binomial
distribution the coefficients are given by

$$S_J = (J-1)!, \tag{7}$$

for both the discrete (1) and continuous (4) cases, in close agreement with the
observational analysis by Gaztanaga. Thus this model might describe the entire
evolution of the PDF in the universe, from the initial Gaussian regime to the
present hierarchical regime.
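The limiting behaviour of the negative binomial model can be checked numerically. The following is a minimal sketch, assuming the product form of eq. (1); the function names and the test values $\bar N = 5$, $\bar\xi_2 = 0.5$ are illustrative choices, not taken from the paper. It recovers $\bar N$ and $\bar\xi_2$ from the first two factorial moments and compares with the Poisson limit.

```python
import math

def negative_binomial_pdf(n, nbar, xi2):
    """Eq. (1): probability of n particles for mean nbar and xi2 > 0."""
    # log of  nbar^n / n!  *  (1 + nbar*xi2)^(-n - 1/xi2)  *  prod_{i=1}^{n-1} (1 + i*xi2)
    log_p = n * math.log(nbar) - math.lgamma(n + 1)
    log_p -= (n + 1.0 / xi2) * math.log1p(nbar * xi2)
    log_p += sum(math.log1p(i * xi2) for i in range(1, n))
    return math.exp(log_p)

def poisson_pdf(n, nbar):
    """Eq. (3): the xi2 -> 0 limit of the negative binomial model."""
    return math.exp(n * math.log(nbar) - nbar - math.lgamma(n + 1))

def factorial_moments(pdf, n_max):
    """Return (<N>, <N(N-1)>) of a discrete PDF truncated at n_max."""
    m1 = sum(n * pdf(n) for n in range(n_max + 1))
    m2 = sum(n * (n - 1) * pdf(n) for n in range(n_max + 1))
    return m1, m2

if __name__ == "__main__":
    nbar, xi2 = 5.0, 0.5  # illustrative values only
    m1, m2 = factorial_moments(lambda n: negative_binomial_pdf(n, nbar, xi2), 400)
    print("mean count:", m1, " recovered xi2:", m2 / m1**2 - 1.0)
    print("Poisson limit check:", negative_binomial_pdf(3, nbar, 1e-8), poisson_pdf(3, nbar))
```

Working with logarithms avoids overflow of $N!$ for large counts; the second factorial moment is used because $\langle N(N-1)\rangle = \bar N^2(1+\bar\xi_2)$ removes the shot-noise term.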



2.2 Lognormal Distribution 

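Another model we consider is the lognormal distribution, which is related to
a normal distribution

$$P(y)\,dy = \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{(y-\bar y)^2}{2\sigma_y^2}\right]dy, \tag{8}$$

where $\bar y$ and $\sigma_y$ are the mean value and standard deviation of $y$,
respectively. Substituting $y$ by $\ln\rho$ with $\rho = N/\bar N$ in the above
distribution, we find a lognormal distribution function

$$P(N,V)\,dN = \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left[-\frac{(\ln\rho-\bar y)^2}{2\sigma_y^2}\right]\frac{dN}{N}, \tag{9}$$

where

$$\bar y = -\frac{1}{2}\ln(1+\bar\xi_2), \qquad \sigma_y^2 = \ln(1+\bar\xi_2)$$

in the continuum limit. It also approaches Gaussian in the limit $\bar\xi_2 \to 0$,
where $\ln(1+\bar\xi_2) \simeq \bar\xi_2$. In this distribution, the $n$-th order moment
of counts is expressed as

$$\langle N^n\rangle = \bar N^n\,(1+\bar\xi_2)^{n(n-1)/2}. \tag{10}$$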
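Therefore if the lognormal model represents the matter distribution correctly, the
averaged two-point correlation function should satisfy the following relation,

$$\bar\xi_2 = \frac{\langle N^2\rangle}{\bar N^2} - 1 = \left(\frac{\langle N^3\rangle}{\bar N^3}\right)^{1/3} - 1 = \left(\frac{\langle N^4\rangle}{\bar N^4}\right)^{1/6} - 1. \tag{11}$$

If an average cell does not contain a large enough number of galaxies, we must
take the discreteness effect into account. The PDF in such a case is obtained by
the formula (White 1979; Fry 1985),

$$P(N,V) = \frac{1}{N!}\left.\frac{d^N}{dt^N}M(V,t)\right|_{t=-1}. \tag{12}$$

Here $M(V,t) = \langle e^{tN}\rangle$ is the moment generating function of the continuous
count for a given volume $V$, which is given for the lognormal distribution by

$$M(V,t) = \int_0^\infty \frac{e^{tx}}{\sqrt{2\pi\ln(1+\bar\xi_2)}}\exp\left\{-\frac{\left[\ln x - \ln\left(\bar N/\sqrt{1+\bar\xi_2}\right)\right]^2}{2\ln(1+\bar\xi_2)}\right\}\frac{dx}{x}. \tag{13}$$

We therefore obtain

$$P(N,V) = \frac{1}{N!}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi\ln(1+\bar\xi_2)}}\exp\left\{Nx - e^x - \frac{\left[x - \ln\left(\bar N/\sqrt{1+\bar\xi_2}\right)\right]^2}{2\ln(1+\bar\xi_2)}\right\}. \tag{14}$$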

The lognormal matter distribution is obtained from the continuity equation
in the nonlinear regime but with linear or Gaussian velocity fluctuations (Coles &
Jones 1991). Thus it might be adopted as a preliminary model to describe the
general tendency of clustering in the weakly nonlinear regime. Kofman et al.
(1994) have shown that the lognormal distribution fits the PDF of an N-body
cold-dark-matter (CDM) simulation quite well, but Bernardeau and Kofman (1994)
argued that its successful fit is just a coincidence due to the particular shape
of the CDM power spectrum. See also Coles et al. (1993). On the other hand,
Bouchet et al. (1993) applied it to the IRAS galaxy redshift survey and concluded
that both the lognormal and the negative binomial distributions fit the observed
PDF well, although the latter is somewhat better.
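The discrete lognormal PDF is a one-dimensional integral and is straightforward to evaluate numerically. The sketch below assumes the Poisson-sampled form in which the integration variable is the logarithm of the continuous count, with mean $\ln(\bar N/\sqrt{1+\bar\xi_2})$ and variance $\ln(1+\bar\xi_2)$; the trapezoidal rule, the integration width and the step count are illustrative choices and not the procedure used in the paper.

```python
import math

def discrete_lognormal_pdf(n, nbar, xi2, steps=1500, width=10.0):
    """Poisson-sampled lognormal P(N,V), integrated over x = ln(continuous count)."""
    s2 = math.log1p(xi2)                 # variance of the log-density
    mu = math.log(nbar) - 0.5 * s2       # ln(nbar / sqrt(1 + xi2))
    s = math.sqrt(s2)
    lo, hi = mu - width * s, mu + width * s
    h = (hi - lo) / steps
    # constant factors: 1/N! and the Gaussian normalisation
    log_norm = -math.lgamma(n + 1) - 0.5 * math.log(2.0 * math.pi * s2)
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        f = math.exp(n * x - math.exp(x) - (x - mu) ** 2 / (2.0 * s2) + log_norm)
        total += 0.5 * f if k in (0, steps) else f   # trapezoidal end weights
    return total * h

if __name__ == "__main__":
    nbar, xi2 = 3.0, 0.5   # illustrative values only
    probs = [discrete_lognormal_pdf(n, nbar, xi2) for n in range(201)]
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * (n - 1) * p for n, p in enumerate(probs))
    print("normalisation:", sum(probs))
    print("mean:", mean, " recovered xi2:", second / mean**2 - 1.0)
```

As with the negative binomial case, the second factorial moment returns $\bar N^2(1+\bar\xi_2)$, so the two models can be compared at fixed $\bar N$ and $\bar\xi_2$.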



2.3 Edgeworth series 

As mentioned in the introduction, the primordial probability distribution function
of density fluctuations is supposedly Gaussian, so the present PDF may be
expressed by modifying the Gaussian distribution to incorporate higher-order
correlations which are generated through gravitational clustering. In this way
Juszkiewicz et al. (1993) proposed to apply the Edgeworth series to the statistics
of density fields. We consider the PDF in terms of
$\nu = \nu(N,V) = \delta(N,V)/\sigma(V)$, where $\delta(N,V) = [N - \bar N(V)]/\bar N(V)$
and $\sigma^2(V) = \langle\delta^2(N,V)\rangle$. Starting with the Gaussian distribution,

$$\phi(\nu) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\nu^2}{2}\right), \tag{15}$$

we expand the PDF, $P(\nu)$, in terms of $\phi(\nu)$ and its derivatives as

$$P(\nu) = c_0\,\phi(\nu) + \frac{c_1}{1!}\phi^{(1)}(\nu) + \frac{c_2}{2!}\phi^{(2)}(\nu) + \cdots, \tag{16}$$

which is known as the Gram-Charlier series (Cramér 1946). Here

$$\phi^{(\ell)}(\nu) \equiv \frac{d^\ell\phi}{d\nu^\ell} = (-1)^\ell H_\ell(\nu)\,\phi(\nu), \tag{17}$$

$$c_\ell = (-1)^\ell\int_{-\infty}^{\infty}H_\ell(\nu)\,P(\nu)\,d\nu, \tag{18}$$

and $H_\ell$ is the Hermite polynomial of degree $\ell$. For example, the lower-order
coefficients are given by

$$c_0 = 1, \quad c_1 = c_2 = 0, \quad c_\ell = (-1)^\ell S_\ell\,\sigma^{\ell-2}\ \ (\text{for } 3 \le \ell \le 5), \quad c_6 = S_6\sigma^4 + 10S_3^2\sigma^2, \tag{19}$$

with

$$S_3 = \frac{\langle\delta^3\rangle}{\sigma^4}, \qquad S_4 = \frac{\langle\delta^4\rangle - 3\sigma^4}{\sigma^6}. \tag{20}$$

The detailed expressions of $S_5$ and $S_6$ are found in Juszkiewicz et al. (1993).
Because $c_6$ in the Gram-Charlier series has another $O(\sigma^2)$ contribution in
addition to the $\phi^{(4)}$ term, which is $O(\sigma^2)$, we have to rearrange the
expansion by collecting all terms with the same powers of $\sigma$. Note that
$S_\ell = O(\sigma^0)$ for all $\ell$ (Fry 1984; Bernardeau 1992). The result of such
rearrangement is the so-called Edgeworth series in powers of $\sigma$. The zeroth-
and the first-order Edgeworth expansion of the PDF is simply $P(\nu) = \phi(\nu)$,
because linear evolution does not alter the Gaussian shape. The second-order
Edgeworth approximation reads



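$$P(\nu) = \phi(\nu)\left[1 + \frac{1}{3!}S_3\,\sigma H_3(\nu)\right], \tag{21}$$

and the third-order counterpart is given by

$$P(\nu) = \phi(\nu)\left[1 + \frac{1}{3!}S_3\,\sigma H_3(\nu) + \frac{1}{4!}S_4\,\sigma^2H_4(\nu) + \frac{10}{6!}S_3^2\,\sigma^2H_6(\nu)\right]. \tag{22}$$

Juszkiewicz et al. (1993) have shown that the Edgeworth approximation fits the
evolution of density fluctuations in N-body simulations with an $n = -1$ power-law
power spectrum evolved from a Gaussian form for $\sigma < 1/4$.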
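The structure of the Edgeworth approximation, and its loss of positivity at large $\sigma$, can be illustrated with a short numerical sketch. It assumes the standard second- and third-order Edgeworth forms with probabilists' Hermite polynomials; the function names and the values of $S_3$, $S_4$ and $\sigma$ used in the demonstration are arbitrary illustrations, not measurements from the simulation.

```python
import math

def hermite(order, nu):
    """Probabilists' Hermite polynomials H_3, H_4 and H_6."""
    if order == 3:
        return nu**3 - 3.0 * nu
    if order == 4:
        return nu**4 - 6.0 * nu**2 + 3.0
    if order == 6:
        return nu**6 - 15.0 * nu**4 + 45.0 * nu**2 - 15.0
    raise ValueError("only orders 3, 4 and 6 are needed here")

def edgeworth_pdf(nu, sigma, s3, s4=0.0, order=2):
    """Edgeworth PDF in nu: order 1 is Gaussian, 2 adds H_3, 3 adds H_4 and H_6."""
    phi = math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)
    corr = 1.0
    if order >= 2:
        corr += s3 * sigma * hermite(3, nu) / 6.0
    if order >= 3:
        corr += s4 * sigma**2 * hermite(4, nu) / 24.0
        corr += 10.0 * s3**2 * sigma**2 * hermite(6, nu) / 720.0
    return phi * corr

def moment(k, pdf, lo=-12.0, hi=12.0, steps=4800):
    """k-th moment of a PDF in nu by the trapezoidal rule."""
    h = (hi - lo) / steps
    vals = [(lo + i * h) ** k * pdf(lo + i * h) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

if __name__ == "__main__":
    pdf = lambda nu: edgeworth_pdf(nu, 0.2, 3.0, 10.0, order=3)
    print("norm:", moment(0, pdf), " <nu^3>:", moment(3, pdf), " <nu^4>:", moment(4, pdf))
    # For sigma of order unity the bracket changes sign in the tail:
    print("P(-3) at sigma = 1:", edgeworth_pdf(-3.0, 1.0, 3.0, order=2))
```

By orthogonality of the Hermite polynomials the third-order form returns $\langle\nu^3\rangle = S_3\sigma$ and $\langle\nu^4\rangle = 3 + S_4\sigma^2$ exactly, whatever the shape of the resulting curve, which is why matching low-order moments does not by itself guarantee a good fit.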
2.4 Skewed Lognormal Distribution



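The skewed lognormal approximation, which combines the lognormal distribution
and the Edgeworth series, has been proposed by Colombi (1994). If we substitute
$\nu$ by $\nu = \nu(N,V) = \Phi(N,V)/\sigma_\Phi(V)$, with
$\Phi(N,V) = \ln\rho - \langle\ln\rho\rangle$, $\sigma_\Phi^2(V) = \langle\Phi^2\rangle$
and $\rho = N/\bar N$, in equation (15) and perform the same procedures, we obtain
a skewed lognormal distribution. It is readily seen that the first-order skewed
lognormal approximation

$$P(N,V) = \frac{1}{\sigma_\Phi N}\,\phi(\nu), \qquad \nu(N,V) = \frac{\ln N - \langle\ln N\rangle}{\langle(\ln N - \langle\ln N\rangle)^2\rangle^{1/2}}, \tag{23}$$

is nothing but the lognormal distribution. The second-order approximation reads

$$P(N,V) = \frac{1}{\sigma_\Phi N}\left[1 + \frac{1}{3!}T_3\,\sigma_\Phi H_3(\nu)\right]\phi(\nu), \tag{24}$$

and the third-order one is given by

$$P(N,V) = \frac{1}{\sigma_\Phi N}\left[1 + \frac{1}{3!}T_3\,\sigma_\Phi H_3(\nu) + \frac{1}{4!}T_4\,\sigma_\Phi^2H_4(\nu) + \frac{10}{6!}T_3^2\,\sigma_\Phi^2H_6(\nu)\right]\phi(\nu), \tag{25}$$

where

$$T_3(V) = \frac{\langle(\ln\rho - \langle\ln\rho\rangle)^3\rangle}{\sigma_\Phi^4}, \tag{26}$$

$$T_4(V) = \frac{\langle(\ln\rho - \langle\ln\rho\rangle)^4\rangle - 3\sigma_\Phi^4}{\sigma_\Phi^6}. \tag{27}$$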



While this distribution also suffers in principle from the same problem as the
Edgeworth series around the Gaussian distribution, namely that it is not positive
definite, the former surpasses the latter in that the expansion parameter
$\sigma_\Phi$ remains smaller than unity even when $\sigma$ is considerably larger
than one, namely, in the nonlinear stage. In fact, using an N-body simulation of
a CDM model, Colombi (1994) showed that this approximation also successfully
describes the distribution function of density fluctuations in the evolved
universe corresponding to the present.
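The inputs of the skewed lognormal model are cumulants of $\ln\rho$, which can be estimated directly from a sample. The sketch below assumes a list of strictly positive densities $\rho$ (the continuum limit, so empty cells are not handled) and evaluates the second-order model as a density in $\rho$; the function names and the synthetic lognormal sample are illustrative assumptions.

```python
import math
import random

def log_cumulants(rho_values):
    """Return (<ln rho>, sigma_Phi, T3, T4) estimated from positive densities."""
    logs = [math.log(r) for r in rho_values]
    n = len(logs)
    mean = sum(logs) / n
    m2 = sum((y - mean) ** 2 for y in logs) / n
    m3 = sum((y - mean) ** 3 for y in logs) / n
    m4 = sum((y - mean) ** 4 for y in logs) / n
    t3 = m3 / m2**2                      # third cumulant over sigma_Phi^4
    t4 = (m4 - 3.0 * m2**2) / m2**3      # fourth cumulant over sigma_Phi^6
    return mean, math.sqrt(m2), t3, t4

def skewed_lognormal_pdf(rho, mean, sigma_phi, t3, order=2):
    """First-order (lognormal) or second-order skewed lognormal density in rho."""
    nu = (math.log(rho) - mean) / sigma_phi
    phi = math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)
    corr = 1.0
    if order >= 2:
        corr += t3 * sigma_phi * (nu**3 - 3.0 * nu) / 6.0
    return phi * corr / (sigma_phi * rho)

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, exactly lognormal sample: T3 and T4 should scatter around zero.
    sample = [rng.lognormvariate(-0.125, 0.5) for _ in range(20000)]
    mean, sigma_phi, t3, t4 = log_cumulants(sample)
    print(mean, sigma_phi, t3, t4)
    print(skewed_lognormal_pdf(1.0, mean, sigma_phi, t3))
```

For data that are exactly lognormal the correction term vanishes and the second-order model reduces to the first-order one; non-zero $T_3$ measures how far the log-density departs from a Gaussian.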



3 Simulation Data and Counts Analysis 

3.1 N-body Simulation Data 

As a cosmological model we make use of the results of an N-body simulation of
a CDM model with a low density parameter performed by Suginohara and Suto
(1991), with a positive cosmological constant and a primordially scale-invariant
fluctuation spectrum. More specifically, its physical parameters are taken as
follows: total particle number $N_{\rm tot} = 64^3 = 262144$, the Hubble constant
$H_0 = 100h\ \mathrm{km\,s^{-1}\,Mpc^{-1}} = 100\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$, present
density parameter $\Omega_0 = 0.2$, and the cosmological constant
$\lambda = \Lambda/(3H_0^2) = 0.8\ (= 1 - \Omega_0)$. The simulation was carried out
with the hierarchical tree code in a cubic volume of $L_b^3$ with a periodic
boundary condition. The comoving size of the simulation box corresponds to
$L_b = 100$ Mpc today, and the mass of each particle to $2 \times 10^{11}M_\odot$,
roughly equal to a typical galactic mass. The amplitude of initial fluctuations is
normalized so that the correlation function has unit amplitude on the scale
$r \simeq 5h^{-1}$ Mpc at the epoch corresponding to the present.
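The quoted particle mass follows from the mean matter density of the box. As a quick arithmetic check, assuming the standard critical density $\rho_{\rm crit} = 2.775\times10^{11}h^2M_\odot\,\mathrm{Mpc}^{-3}$ (a value not stated in the text) and a hypothetical helper name:

```python
RHO_CRIT = 2.775e11   # critical density in h^2 M_sun / Mpc^3 (assumed standard value)

def particle_mass(omega0, h, box_mpc, n_particles):
    """Mass per particle in solar masses for a cubic box of side box_mpc (Mpc)."""
    mean_density = omega0 * RHO_CRIT * h**2          # M_sun / Mpc^3
    return mean_density * box_mpc**3 / n_particles

# Parameters quoted for the simulation: Omega_0 = 0.2, h = 1, L_b = 100 Mpc, 64^3 particles.
mass = particle_mass(omega0=0.2, h=1.0, box_mpc=100.0, n_particles=64**3)
print(f"{mass:.2e} M_sun")
```

The result is close to $2\times10^{11}M_\odot$, consistent with the figure given above.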

Detailed analysis of the simulation has shown that this model is among the most
promising ones to reproduce the observed large scale structure (Ueda et al. 1993)
without nontrivial biasing, as far as we normalize the amplitude of fluctuations
using the correlation function. On the other hand, normalizing the amplitude in
terms of COBE data, Efstathiou et al. (1992) concluded that a substantial amount
of anti-biasing is required for a spatially flat low-density CDM model with
scale-invariant initial fluctuations. Since our simulation is sensitive to
comoving scales only up to $100h^{-1}$ Mpc, it could be reconciled with the COBE
normalization scheme without biasing if we assumed a different spectrum of
initial fluctuations which has a bend on large scales.

In order to examine the evolution of the statistical distribution, we use the
position data of each particle at four different epochs, $a = 1.0$, 3.0, 5.0, and
6.0, where $a$ is the scale factor, which is normalized to unity at the initial
epoch, when the statistical distribution is practically Gaussian, and $a = 6.0$
corresponds to the present. Figure 1 depicts the time evolution of
$\bar\xi_2(r)$ as a function of the radius of a cell.

3.2 Counts-in-Cells Analysis

In performing the counts-in-cells analysis, we adopt spherical cells. We randomly
generate 100000 points in the simulation box, which serve as the centers of the
cells, and calculate the distance between each point and each galaxy, paying
attention to the periodic nature of the entire box. From these data we can easily
reproduce the PDF, $P(N,V)$, for arbitrary volume with
$L_b^3/64^3 = 3.8 \times 10^{-6}L_b^3 < V < L_b^3$. In practice we took $V$ in the range

$$2.7 \times 10^{-4}L_b^3 < V < 3.3 \times 10^{-2}L_b^3, \tag{28}$$

corresponding to the range of the radius of a cell,

$$0.04L_b < r < 0.2L_b. \tag{29}$$

As is seen above, we place a fixed number of cells independent of their volume. As
long as this number is large enough, it does not affect the shape of the estimated
PDF, although it does affect the estimation of errors in the PDF reproduced, which
we do not work out here.
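The counting procedure described in this subsection can be sketched in a few lines. The version below is a brute-force illustration with hypothetical function names, using the minimum-image convention for the periodic box; it is not the code used for the paper, and for $64^3$ particles and 100000 cells a spatial index would be needed in practice.

```python
import random
from collections import Counter

def count_in_sphere(center, points, radius, box):
    """Number of points within `radius` of `center` in a periodic cube of side `box`."""
    r2 = radius * radius
    count = 0
    for p in points:
        d2 = 0.0
        for c, x in zip(center, p):
            d = abs(x - c) % box
            d = min(d, box - d)          # minimum-image separation along this axis
            d2 += d * d
        if d2 <= r2:
            count += 1
    return count

def counts_in_cells(points, radius, box=1.0, n_cells=1000, seed=0):
    """Counts in randomly centred spherical cells; returns one count per cell."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        center = tuple(rng.uniform(0.0, box) for _ in range(3))
        counts.append(count_in_sphere(center, points, radius, box))
    return counts

def pdf_from_counts(counts):
    """Estimate P(N, V) as the fraction of cells containing N points."""
    hist = Counter(counts)
    return {n: hist[n] / len(counts) for n in sorted(hist)}

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
    counts = counts_in_cells(pts, radius=0.1, n_cells=400)
    print(pdf_from_counts(counts))
```

Because the cell centres are drawn uniformly, the expected mean count equals the number of points times the cell volume fraction, here $500 \times \frac{4\pi}{3}(0.1)^3 \approx 2.1$, which gives a simple sanity check on the periodic wrapping.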
4 Fitting evolution of the PDF



4.1 Negative Binomial versus Lognormal distributions 

We shall now display the results of the counts-in-cells analysis, first in
comparison with the negative binomial and lognormal distributions. In figures 2
the histogram of the calculated PDF is depicted for $r = 0.04L_b$, $0.08L_b$, and
$0.2L_b$, together with the negative binomial (solid line) and lognormal (dashed
line) distributions with the same values of $\bar N$ and $\bar\xi_2$. As is seen
there, both models reproduce the PDF reasonably well for $r = 0.2L_b$. For the
cases $r = 0.04L_b$ and $0.08L_b$, however, although the lognormal distribution is
still applicable, significant deviation is observed between the negative binomial
model and the real PDF. Thus we conclude that the negative binomial distribution
is not an appropriate model for the PDF in the highly nonlinear regime.

As a different test of the lognormal distribution, let us examine lower-order
moments of counts in terms of the predicted relation (11). Figures 3 depict
$\bar\xi_2 = \langle N^2\rangle/\bar N^2 - 1$ (solid line) as well as
$(\langle N^3\rangle/\bar N^3)^{1/3} - 1$ (dashed line) and
$(\langle N^4\rangle/\bar N^4)^{1/6} - 1$ (dot-dashed line) as functions of $r$ at
various epochs. All of them should agree with each other if the underlying PDF
obeys the lognormal distribution. As is seen there, these three lines practically
coincide with each other at the initial epoch, where particles are distributed
using the Zel'dovich approximation. This result is consistent with the analysis of
Kofman et al. (1994), who found that the PDF derived by the Zel'dovich
approximation is very similar to the lognormal distribution in the quasi-linear
regime. The agreement remains valid on large scales up to the epoch $a = 6.0$. On
small scales with $\bar\xi_2 \gtrsim 1$, however, these lines start to deviate from
each other as the nonlinearity increases.
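The moment test above amounts to computing three estimators of $\bar\xi_2$ from the same set of counts and checking whether they coincide. A minimal sketch follows, assuming the continuum limit (large $\bar N$, so shot-noise corrections are ignored); the function name and the synthetic samples are illustrative.

```python
import math
import random

def xi2_estimators(counts):
    """Three estimates of xi2 from moment ratios; they agree for a lognormal PDF."""
    n = len(counts)
    mean = sum(counts) / n
    ratios = [sum(c**k for c in counts) / n / mean**k for k in (2, 3, 4)]
    return (ratios[0] - 1.0,
            ratios[1] ** (1.0 / 3.0) - 1.0,
            ratios[2] ** (1.0 / 6.0) - 1.0)

if __name__ == "__main__":
    rng = random.Random(0)
    xi2 = 0.2   # illustrative value
    lognormal = [rng.lognormvariate(0.0, math.sqrt(math.log1p(xi2))) for _ in range(100000)]
    print("lognormal sample:", xi2_estimators(lognormal))
    # A gamma-distributed sample with the same variance (the continuum limit of the
    # negative binomial model) is expected to give three different values.
    gamma = [rng.gammavariate(1.0 / xi2, xi2) for _ in range(100000)]
    print("gamma sample:    ", xi2_estimators(gamma))
```

The spread between the three numbers is therefore a direct, fit-free measure of departure from lognormality.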

This suggests that the lognormal distribution, too, is not an adequate model
to characterize the PDF in the highly nonlinear regime and that some correction
should be taken into account. In the next subsection we study the Edgeworth series
around the Gaussian distribution as a preparation for such an improvement.

4.2 Edgeworth expansion around Gaussian distribution 

One can consider the Edgeworth expansion introduced in §2.3 as a way to obtain
a PDF which correctly reproduces arbitrary higher-order moments of the count.
For example, if we adopt (22) it can reproduce both $\langle\nu^3\rangle$ and
$\langle\nu^4\rangle$ as long as we insert the correct observed values of $S_3$ and
$S_4$. Figures 4 compare the calculated PDF with the first- (solid), second-
(dashed), and third-order (dot-dashed line) Edgeworth series. The first-order one
is nothing but the Gaussian distribution. Although the second- and third-order
Edgeworth PDFs fit the histogram quite well on the large scale $r = 0.2L_b$,
significant deviation is observed at the later epochs on the small scale
$r = 0.04L_b$. As originally stressed by Juszkiewicz et al. (1993), these PDFs are
not appropriate to fit the observed PDF in the highly nonlinear regime. In
particular, the fact that they contain the Hermite polynomials implies that they
become oscillatory and not positive definite for larger values of $\sigma$.

4.3 Skewed Lognormal Distribution and Kolmogorov-Smirnov test

Having confirmed that the expansion around the Gaussian distribution does not
provide a good approximation, we next consider that around the lognormal
distribution following Colombi (1994). Figures 5 compare the PDF derived from the
counts-in-cells analysis with the skewed lognormal distribution for $r = 0.04L_b$,
$0.08L_b$ and $0.2L_b$. In each panel, solid, dashed, and dot-dashed lines represent
first- (i.e. lognormal), second-, and third-order skewed lognormal distributions,
respectively. In these figures, we find that the second- and the third-order
distributions reproduce the PDF extremely well for all the cases, better than the
lognormal model. In fact, the second- and the third-order distributions look
degenerate in most cases.

In order to obtain a more quantitative conclusion, let us adopt the
Kolmogorov-Smirnov (KS) test using unbinned data. This test is applicable to the
cumulative distribution function (CDF) $C(N,V) = \sum_{J=0}^{N}P(J,V)$. For
comparing the observed CDF, $C_{\rm obs}(N,V)$, with theoretical distributions,
$C_{\rm th}(N,V)$, such as the cumulative lognormal or cumulative skewed lognormal
distributions, we introduce the KS statistic $D$ as

$$D = \max_N\left|C_{\rm obs}(N,V) - C_{\rm th}(N,V)\right|, \tag{30}$$

which is the maximum value of the absolute difference between the observed CDF
and the theoretical CDF. The significance level of an observed value of $D$ is
given approximately by

$$P(D > \mathrm{observed}) = Q_{KS}(\sqrt{T}\,D), \tag{31}$$

where

$$Q_{KS}(x) = 2\sum_{j=1}^{\infty}(-1)^{j-1}\exp(-2j^2x^2). \tag{32}$$

Here $T$ is the maximum value of counts observed in a cell. Notice that
$Q_{KS}(x)$ is a monotonic function with $Q_{KS}(0) = 1$ and $Q_{KS}(\infty) = 0$.
Therefore if two CDFs are identical to each other, $P(D > \mathrm{observed})$
becomes unity, and its deviation from unity represents the discrepancy between
these CDFs. Table 1 shows the results of $P(D > \mathrm{observed})$ between the
observed PDF and the $k$-th order skewed lognormal models. From this table, we can
again see that the second- and third-order approximations are superior to the
lognormal distribution function, for which $P(D > \mathrm{observed})$ is
considerably smaller than unity for $r = 0.04L_b$ at $a = 3.0$, 5.0, and 6.0.
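The KS quantities used above are simple to compute once the observed and model PDFs are tabulated on the same grid of counts $N = 0, 1, 2, \ldots$. The sketch below assumes the alternating series form of $Q_{KS}$ and takes $T$ as the largest count with non-zero observed probability; the cut-off at $x < 0.2$, where the series converges slowly and $Q_{KS}$ differs from unity by less than about $10^{-12}$, is a numerical shortcut introduced here.

```python
import math

def q_ks(x, tol=1e-12, max_terms=200):
    """Q_KS(x) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 x^2)."""
    if x < 0.2:
        return 1.0          # series converges slowly; Q_KS is 1 to ~1e-12 here
    total = 0.0
    for j in range(1, max_terms + 1):
        term = math.exp(-2.0 * j * j * x * x)
        total += term if j % 2 == 1 else -term
        if term < tol:
            break
    return min(1.0, max(0.0, 2.0 * total))

def ks_statistic(pdf_obs, pdf_th):
    """Maximum absolute difference between the two cumulative distributions."""
    d = c_obs = c_th = 0.0
    for p_obs, p_th in zip(pdf_obs, pdf_th):
        c_obs += p_obs
        c_th += p_th
        d = max(d, abs(c_obs - c_th))
    return d

def ks_significance(pdf_obs, pdf_th):
    """P(D > observed) with T taken as the largest count with non-zero observed PDF."""
    t_max = max(n for n, p in enumerate(pdf_obs) if p > 0.0)
    return q_ks(math.sqrt(t_max) * ks_statistic(pdf_obs, pdf_th))

if __name__ == "__main__":
    observed = [0.10, 0.25, 0.30, 0.20, 0.10, 0.05]   # made-up PDFs for illustration
    model = [0.12, 0.24, 0.28, 0.20, 0.11, 0.05]
    print(ks_statistic(observed, model), ks_significance(observed, model))
```

Identical distributions give $D = 0$ and a significance of exactly one, matching the interpretation of Table 1.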

Thus the KS test shows that the lognormal distribution function does not
reproduce the PDF correctly in the highly nonlinear regime, although the deviation
is not marked. On the other hand, on the scales of our interest the second- and
the third-order approximations fit the data equally well. We therefore conclude
that the second-order skewed lognormal distribution suffices to describe the
evolution of the statistical distribution from its Gaussian initial condition to
the present universe.

In addition to the KS test, we have estimated the normalized skewness, $S$, and
the normalized kurtosis, $K$, of counts in the skewed lognormal approximation,
because, unlike in the Edgeworth series around the Gaussian distribution, these
quantities are not guaranteed to match the observed values automatically in this
approximation. They are defined by

and the results are depicted in figures 6 where solid line represents the observed 
values and short dashed, dot-short dashed and dot-long dashed lines are the re- 
sults calculated from the lognormal distribution, second- and third-order skewed 
lognormal models, respectively. These figures also show that skewed lognormal 
approximation is better than the lognormal distribution and that the second- and 
the third-order approximations are equally satisfactory. 



5 Effects of sparse sampling 

So far we have analyzed the statistical distribution using all of the 262144
particles in the simulation, and we have been fairly free from the effects of
discreteness. In the counts analysis of the actual universe, however, we must
inevitably use volume-limited samples which contain only a limited and rather
small number of galaxies. Here we analyze the dependence of the PDF on the total
number, $N_{\rm obs}$, of "galaxies" we "observe" in the simulation box at the
present epoch $a = 6.0$. In order to simulate such sparsely traced samples, we
perform the counts analysis using some limited portions of the particles, which
are selected randomly because no information is contained in the simulation on
the luminosity of each particle.
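The random dilution described above, and the number density it corresponds to, can be sketched as follows. The function names are hypothetical, and the box side of $100h^{-1}$ Mpc is taken from the simulation parameters with $h = 1$.

```python
import random

def sparse_sample(particles, n_obs, seed=0):
    """Randomly select n_obs particles without replacement."""
    return random.Random(seed).sample(particles, n_obs)

def number_density(n_obs, box_size):
    """Mean number density for n_obs objects in a cube of side box_size."""
    return n_obs / box_size**3

if __name__ == "__main__":
    particle_ids = list(range(64**3))          # stand-in for the particle list
    for n_obs in (700, 3000, 10000, 30000):
        subset = sparse_sample(particle_ids, n_obs)
        print(n_obs, len(subset), number_density(n_obs, 100.0), "h^3 Mpc^-3")
```

With this box size, $N_{\rm obs} = 3000$ corresponds to $0.003\,h^3\,\mathrm{Mpc}^{-3}$ and $N_{\rm obs} = 10000$ to $0.01\,h^3\,\mathrm{Mpc}^{-3}$.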

Figures 7 depict the result, where the solid line represents the negative binomial
distribution, eq. (1), and the dot-dashed line stands for the lognormal
distribution with discreteness effects taken into account, eq. (14). For
comparison, the lognormal model in the continuum limit, eq. (9), is also depicted
in the figures with a dashed line. As above, three different scales are probed,
corresponding to $r = 4$, 8 and $20h^{-1}$ Mpc, for which $\bar\xi_2$ is calculated
to be $\bar\xi_2 = 2.9$, 0.90, and 0.18, respectively. As is seen there, for
$N_{\rm obs} \lesssim 3000$ both the negative binomial and the lognormal
distributions fit the data equally well except for the low-count tail of the case
with $(r, N_{\rm obs}) = (0.08L_b, 3000)$, but in some cases the negative binomial
distribution seems to characterize the histogram somewhat better. On the other
hand, as $N_{\rm obs}$ is increased, significant deviation starts to appear between
the negative binomial distribution and the calculated PDF, especially in the
low-count region on scales in the quasi-linear or semi-nonlinear regime.

Note again that presently available volume-limited samples of galaxies contain
rather small numbers of galaxies. For example, $N_{\rm obs} = 700$ roughly
corresponds to the number density of galaxies in the samples CfANSO and SSRS80
used by Gaztanaga and Yokoyama (1993), and $N_{\rm obs} = 3000$ to that in the IRAS
sample of Bouchet et al. (1993) with radius $39h^{-1}$ Mpc. We can hardly
distinguish the negative binomial and the lognormal distributions using these
samples. We need a catalog with a larger number of galaxies.

6 Conclusion 

In the present paper we have investigated the evolution of the statistical
distribution in a cosmological N-body self-gravitating system. Using the results
of a low-density CDM simulation, which reproduces various observational data well,
such as the two-point correlation function of galaxies and that of clusters,
without nontrivial biasing, we have phenomenologically examined which theoretical
model fits the simulated PDF best in terms of the counts-in-cells method. As for
the theoretical distributions, we have considered four different approaches,
namely, the negative binomial model, the lognormal distribution, the Edgeworth
series around the Gaussian, and the skewed lognormal approximation, all of which
are theoretically well motivated.

Our results show that the negative binomial model and the Edgeworth series are not
suitable to fully describe the statistical distribution of the underlying density
field. The failure of the latter in the nonlinear regime is a good example of the
fact that precise knowledge of lower-order moments is far from sufficient to
specify the shape of the PDF.

On the other hand, we have found that the lognormal distribution matches the
observed PDF reasonably well except in the highly nonlinear regime. If we improve
it to the skewed lognormal distribution, the agreement is extended to the
nonlinear stage as well. The price we must pay for it is additional input
parameters, $T_3$, etc. Our analysis, however, has shown that inclusion of only
$T_3$ is sufficient to obtain a satisfactory approximation to the PDF at least up
to the present stage of clustering. As emphasized by Colombi (1994), the skewed
lognormal distribution is also doomed to failure in the extremely nonlinear
regime. Fortunately, however, as far as the cosmological statistical distribution
probed by galaxies is concerned, this would only take place in the distant future.

Armed with the fact that the underlying statistical distribution in the present
model is satisfactorily characterized by the (skewed) lognormal distribution, we
have also investigated the dependence of the PDF on the number of galaxies we use
to calculate it. We have shown that, for the range of $N_{\rm obs}$ with a number
density of galaxies comparable to typical volume-limited samples available today,
there exists at least one different distribution that reproduces the calculated
PDF well but does not reflect the underlying matter distribution properly, namely,
the negative binomial distribution. Thus the previous conclusions of the
observational analyses (Gaztanaga & Yokoyama 1993; Bouchet et al. 1993) that this
distribution fits the observed PDF might have nothing to do with the real
statistical distribution of our universe. In fact, to single out the correct
statistical distribution we should not only probe the PDF on various length scales
with different values of $\bar\xi_2$ but also use volume-limited samples with a
large enough number density, $n_{\rm gal}$, of galaxies, say,
$n_{\rm gal} > 0.01\,h^3\,\mathrm{Mpc}^{-3}$.
Table 1. Result of Kolmogorov-Smirnov Test

  r     |       0.04 L_b          |        0.2 L_b
  a     | 1.0   3.0   5.0   6.0   | 1.0   3.0   5.0   6.0
  ------+-------------------------+------------------------
  k=1   | 1.00  0.97  0.56  0.29  | 1.00  1.00  1.00  1.00
  k=2   | 1.00  1.00  1.00  1.00  | 1.00  1.00  1.00  1.00
  k=3   | 1.00  1.00  1.00  1.00  | 1.00  1.00  1.00  1.00

KS statistic $P(D > \mathrm{observed})$ for the $k$-th-order skewed lognormal
distribution as a function of $r$ and $a$.
Figure Captions

Figure 1 : Time evolution of $\bar\xi_2(r)$ as a function of the radius (in units of $L_b$) of a cell.

Figure 2 : Time evolution of the PDF in comparison with the negative binomial and the
lognormal distributions. In each panel, histogram, solid and dashed lines represent the
PDF, negative binomial and lognormal distributions, respectively. Three different scales,
$r = 0.04L_b$, $0.08L_b$, and $0.2L_b$, are represented.

Figure 3 : Averaged two-point correlation functions estimated from
$\langle N^2\rangle/\bar N^2 - 1$ (solid line), $(\langle N^3\rangle/\bar N^3)^{1/3} - 1$
(dotted line) and $(\langle N^4\rangle/\bar N^4)^{1/6} - 1$ (dashed line).

Figure 4 : Comparison between the PDF $P(\nu)$ and the Edgeworth series. The histogram
represents the PDF estimated from the counts-in-cells analysis. Solid, dashed and
dot-dashed lines represent the first-, second- and third-order Edgeworth series.

Figure 5 : Comparison between the PDF and the skewed lognormal distribution. In each
panel, the histogram represents the calculated PDF and solid, dashed, and dot-dashed
lines represent the first-, second- and third-order skewed lognormal distributions. The
second- and the third-order distributions are hardly distinguishable from each other.

Figure 6 : Skewness and kurtosis of density fluctuations as a function of $V$ in units of
$L_b^3$. Solid, dashed, dot-dashed and dot-long dashed lines represent the skewness or
kurtosis estimated from the actual PDF and the first-, second- and third-order skewed
lognormal distributions, respectively.

Figure 7 : Dependence of the PDF on the number of galaxies, $N_{\rm obs}$, which are
selected randomly from the simulation. In each panel, histogram, solid, dashed, and
dot-dashed lines represent the PDF and the negative binomial, continuous lognormal, and
discrete lognormal models, respectively. For the case $(r, N_{\rm obs}) = (0.04L_b, 700)$
the solid and dot-dashed lines are degenerate, while for
$(r, N_{\rm obs}) = (0.2L_b, 10000)$ and $(0.2L_b, 30000)$ the discrete and continuous
lognormal distributions coincide with each other.